Query evaluation revised: parallel, distributed, via rewritings
This is a thesis on query evaluation in parallel and distributed settings, and structurally simple rewritings.
It consists of three parts.
In the first part, we investigate the efficiency of constant-time parallel evaluation algorithms, measured by the number of required processors or, asymptotically equivalently, the work required to evaluate queries in constant time. It is known that relational algebra queries can be evaluated in constant time. However, work-efficiency has not been a focus, and indeed known evaluation algorithms yield huge (polynomial) work bounds. We establish work-efficient constant-time algorithms for several query classes: (free-connex) acyclic, semi-join algebra, and natural join queries; the latter in the worst-case optimal framework.
The second part is about deciding parallel-correctness of distributed evaluation strategies: given a query and policies specifying how data is distributed and communicated among multiple servers, does the distributed evaluation yield the same result as the classical evaluation, for every database? Ketsman et al. proved that parallel-correctness for Datalog is undecidable, by reduction from the undecidable containment problem for Datalog. We show that parallel-correctness is already undecidable for monadic and frontier-guarded Datalog queries, for which containment is decidable. However, deciding parallel-correctness for frontier-guarded Datalog and constraint-based communication policies satisfying a certain property is 2ExpTime-complete. Furthermore, we obtain the same bounds for the parallel-boundedness problem, which asks whether the number of required communication rounds is bounded over all databases.
The third part is about structurally simple rewritings. The (classical) rewriting problem asks whether, for a given query and a set of views, there is a query over the views, called a rewriting, that is equivalent to the given query. We study the variant of this problem for (subclasses of) conjunctive queries and views that asks for a structurally simple rewriting. We prove that, if the given query is acyclic, an acyclic rewriting exists if there is any rewriting at all. Analogous statements hold for free-connex acyclic, hierarchical, and q-hierarchical queries. Furthermore, we prove that the problem is NP-hard, even if the given query and the views are acyclic or hierarchical. It becomes tractable if the views are free-connex acyclic or q-hierarchical (and the arity of the database schema is bounded).
Decision Problems for Subclasses of Rational Relations over Finite and Infinite Words
We consider decision problems for relations over finite and infinite words
defined by finite automata. We prove that the equivalence problem for binary
deterministic rational relations over infinite words is undecidable in contrast
to the case of finite words, where the problem is decidable. Furthermore, we
show that it is decidable in doubly exponential time for an automatic relation
over infinite words whether it is a recognizable relation. We also revisit this
problem in the context of finite words and improve the complexity of the
decision procedure to single exponential time. The procedure is based on a
polynomial time regularity test for deterministic visibly pushdown automata,
which is a result of independent interest.
Comment: 32 pages; submitted to DMTCS; extended version of the paper with the same title published in the conference proceedings of FCT 2017.
Work-Efficient Query Evaluation with PRAMs
The paper studies query evaluation in parallel constant time in the PRAM model. While it is well known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW-PRAM, this paper is interested in the efficiency of evaluation algorithms, that is, in the number of processors or, asymptotically equivalently, in the work. Naive evaluation in the parallel setting results in huge (polynomial) bounds on the work of such algorithms and in presentations of the result sets that can be extremely scattered in memory. The paper first discusses some obstacles for constant-time PRAM query evaluation. It presents algorithms for relational operators that are considerably more efficient than the naive approaches. Further, it explores three settings in which efficient sequential query evaluation algorithms exist: acyclic queries, semi-join algebra queries, and join queries, the latter in the worst-case optimal framework. Under natural assumptions on the representation of the database, the work of the given algorithms matches the best sequential algorithms in the case of semi-join queries, and it comes close in the other two settings. An important tool is the compaction technique from Hagerup (1992).
Rewriting with Acyclic Queries: Mind Your Head
The paper studies the rewriting problem, that is, the decision problem whether, for a given conjunctive query Q and a set V of views, there is a conjunctive query Q' over V that is equivalent to Q, for cases where the query, the views, and/or the desired rewriting are acyclic or even more restricted.
It shows that, if Q itself is acyclic, an acyclic rewriting exists if there is any rewriting. An analogous statement also holds for free-connex acyclic, hierarchical, and q-hierarchical queries.
Regarding the complexity of the rewriting problem, the paper identifies a border between tractable and (presumably) intractable variants of the rewriting problem: for schemas of bounded arity, the acyclic rewriting problem is NP-hard, even if both Q and the views in V are acyclic or hierarchical. However, it becomes tractable if the views are free-connex acyclic (i.e., in a nutshell, their body is (i) acyclic and (ii) remains acyclic if their head is added as an additional atom).
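The two-part characterization of free-connex acyclicity quoted above can be illustrated with a small sketch. It is not taken from the paper: it assumes that acyclicity is tested with the standard GYO reduction on the query hypergraph, and the function names `is_acyclic` and `is_free_connex_acyclic`, as well as the encoding of atoms as tuples of variable names, are hypothetical choices made for this illustration.

```python
def is_acyclic(atoms):
    """GYO reduction (assumed test for acyclicity, not the paper's own code).

    `atoms` is a list of variable collections, one per atom (hyperedge).
    The hypergraph is acyclic iff the reduction removes every hyperedge.
    """
    edges = [set(a) for a in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


def is_free_connex_acyclic(head_vars, body_atoms):
    """(i) the body is acyclic and (ii) stays acyclic with the head added."""
    body = [tuple(a) for a in body_atoms]
    return is_acyclic(body) and is_acyclic(body + [tuple(head_vars)])


# Q1(x, z) :- R(x, y), S(y, z): acyclic body, but the head closes a triangle.
q1_free_connex = is_free_connex_acyclic(("x", "z"), [("x", "y"), ("y", "z")])
# Q2(x, y) :- R(x, y), S(y, z): the head is covered by the atom R(x, y).
q2_free_connex = is_free_connex_acyclic(("x", "y"), [("x", "y"), ("y", "z")])
```

In this sketch, Q1 is acyclic but not free-connex acyclic, whereas Q2 satisfies both conditions. The quadratic-time loops are chosen for readability only.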
Parallel-Correctness and Parallel-Boundedness for Datalog Programs
Recently, Ketsman et al. started the investigation of the parallel evaluation of recursive queries in the Massively Parallel Communication (MPC) model. Among other things, it was shown that parallel-correctness and parallel-boundedness for general Datalog programs are undecidable, by a reduction from the undecidable containment problem for Datalog. Furthermore, economic policies were introduced as a means to specify data distribution in a recursive setting. In this paper, we extend the latter framework to account for more general distributed evaluation strategies in terms of communication policies. We then show that the undecidability of parallel-correctness runs deeper: it already holds for fragments of Datalog, e.g., monadic and frontier-guarded Datalog, with a decidable containment problem, under relatively simple evaluation strategies. These simple evaluation strategies are defined w.r.t. data-moving distribution constraints. We then investigate restrictions of economic policies that yield decidability. In particular, we show that parallel-correctness is 2EXPTIME-complete for monadic and frontier-guarded Datalog under hash-based economic policies. Next, we consider restrictions of data-moving constraints and show that parallel-correctness and parallel-boundedness are 2EXPTIME-complete for frontier-guarded Datalog. Interestingly, distributed evaluation no longer preserves the usual containment relationships between fragments of Datalog. Indeed, not every monadic Datalog program is equivalent to a frontier-guarded one in the distributed setting. We illustrate the latter by considering two alternative settings, in one of which parallel-correctness is decidable for frontier-guarded Datalog but undecidable for monadic Datalog.
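To make the notion of parallel-correctness more concrete, the following sketch compares centralized and distributed evaluation on one fixed database. It is a strong simplification and not the paper's setting: it assumes a non-recursive conjunctive query, a single communication-free round, and a policy given as a function from facts to sets of servers, whereas the decision problem quantifies over all databases and covers recursive Datalog. All names (`evaluate`, `parallel_correct_on`, `hash_policy`, `split_policy`) are hypothetical.

```python
from itertools import product


def evaluate(head, body, facts):
    """Naively evaluate a conjunctive query over facts of the form (rel, tuple)."""
    variables = sorted({v for _, args in body for v in args})
    domain = {c for _, args in facts for c in args}
    answers = set()
    # Try every assignment of domain values to the body variables.
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        if all((rel, tuple(val[v] for v in args)) in facts for rel, args in body):
            answers.add(tuple(val[v] for v in head))
    return answers


def parallel_correct_on(head, body, facts, policy, servers):
    """Check, for this one database only, whether the union of the local
    results equals the centralized result."""
    central = evaluate(head, body, facts)
    distributed = set()
    for s in servers:
        local = {f for f in facts if s in policy(f)}
        distributed |= evaluate(head, body, local)
    return central == distributed


# Q(x, z) :- R(x, y), S(y, z)
head = ("x", "z")
body = [("R", ("x", "y")), ("S", ("y", "z"))]
facts = {("R", (1, 2)), ("S", (2, 3))}


def hash_policy(fact):
    # Assumed policy: route each fact by the value of the join variable y.
    rel, args = fact
    return {(args[1] if rel == "R" else args[0]) % 2}


def split_policy(fact):
    # Assumed policy: all R-facts go to server 0, all S-facts to server 1.
    return {0} if fact[0] == "R" else {1}


ok_hash = parallel_correct_on(head, body, facts, hash_policy, [0, 1])
ok_split = parallel_correct_on(head, body, facts, split_policy, [0, 1])
```

Under `hash_policy` the two joining facts meet on the same server, so the local results cover the centralized answer; under `split_policy` they never meet and the answer (1, 3) is lost. A check on one database can only refute parallel-correctness, never establish it.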
Decision Problems for Subclasses of Rational Relations over Finite and Infinite Words
1 sheet of a multi-sheet map, photographic copies, b/w. - The collection of the Cartoteca de la Universitat de Girona comprises more than 2,000 sheets, with and without toponymy. - The date is that of the flight. - The numbering follows the sheet index chart with row/column information. 60 x 30 cm each sheet. 1:5 00